Semi-automatic language model acquisition without large corpora

Authors

  • Tomoyosi Akiba
  • Katsunobu Itou
Abstract

Statistical language models have gained a reputation for their overall performance in speech recognition, and so are widely used in speech recognition systems today. The tasks to which statistical language models can be applied are, however, limited, because a large corpus is essential for building a statistical model, and collecting a new corpus is very costly in terms of time and effort. Thus, if our aim is to apply speech recognition to various tasks as required, we need a way of developing a new language model for a given task at a reasonable cost.

Similar resources

Semi-automatic acquisition of domain-specific semantic structures

This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive towards language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or the availability of annotated corpora. To strive for a reasonab...

Lexical Knowledge Acquisition from Corpora

The paper presents a computational environment to support developing a lexicon for natural language processing. The underlying idea of the environment is to utilize up-to-date language technologies to minimize both the human labor and the inconsistency that are unavoidable in manual compilation of a lexicon. The proposed computational environment enables an efficient construction of a consistent ...

Towards a Workbench for Acquisition of Domain Knowledge from Natural Language

In this paper we describe an architecture and functionality of main components of a workbench for an acquisition of domain knowledge from large text corpora. The workbench supports an incremental process of corpus analysis starting from a rough automatic extraction and organization of lexico-semantic regularities and ending with a computer supported analysis of extracted data and a semi-automat...

Contextual Meta-Knowledge Acquisition from Corpora

This paper looks at the area of automatic acquisition of meta-knowledge for the structuring of very large knowledge bases (VLKB). It is argued that we will rediscover the need in Natural Language Processing (NLP) for such large knowledge bases and that one possible method for structuring them efficiently lies in association-based statistics gathered from corpora. The discussion sets out the aims ...

Deriving an Lfg from a Treebank Resource

High-quality training corpora are crucial for statistical approaches to natural language processing. For probabilistic Lexical Functional Grammars (LFG-DOP, (Bod R. & Kaplan R. 1998)) significant corpora of texts associated with both c-structure and f-structure representations are required. This poses an important acquisition problem: manual construction is time-consuming and error-prone while s...

Publication date: 2000